AITopics | Western New Guinea

Western New Guinea

ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems

Gui, Xin, Zhu, King, Ren, JinCheng, Chen, Qianben, Wang, Zekun Moore, LI, Yizhi, Liu, Xinpeng, Li, Xiaowan, Ren, Wenli, Miao, Linyu, Qin, Tianrui, Shu, Ziqi, Zhu, He, Tang, Xiangru, Shi, Dingfeng, Liu, Jiaheng, Jiang, Yuchen Eleanor, Liu, Minghao, Zhang, Ge, Zhou, Wangchunshu

arXiv.org Artificial Intelligence, Oct-14-2025

In recent years, the research focus of large language models (LLMs) and agents has shifted increasingly from demonstrating novel capabilities to complex reasoning and tackling challenging tasks. However, existing evaluations focus mainly on math/code contests or general tasks, while existing multi-domain academic benchmarks lack sufficient reasoning depth, leaving the field without a rigorous benchmark for high-level reasoning. To fill this gap, we introduce the Acadreason benchmark, designed to evaluate the ability of LLMs and agents to acquire and reason over academic knowledge. It consists of 50 expert-annotated academic problems across five high-reasoning domains, including computer science, economics, law, mathematics, and philosophy. All questions are sourced from top-tier publications in recent years and undergo rigorous annotation and quality control to ensure they are both challenging and answerable. We conduct systematic evaluations of over 10 mainstream LLMs and agents. The results show that most LLMs scored below 20 points, with even the cutting-edge GPT-5 achieving only 16 points. While agents achieved higher scores, none exceeded 40 points. This demonstrates the current capability gap between LLMs and agents in super-intelligent academic research tasks and highlights the challenges of Acadreason.

benchmark, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.11652

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Netherlands > South Holland > The Hague (0.04)
Europe > Monaco (0.04)
(2 more...)

Genre: Research Report > New Finding (0.87)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

If you can distinguish, you can express: Galois theory, Stone--Weierstrass, machine learning, and linguistics

Blum-Smith, Ben, Brugman, Claudia, Conners, Thomas, Villar, Soledad

arXiv.org Machine Learning, Oct-14-2025

This essay develops a parallel between the Fundamental Theorem of Galois Theory and the Stone--Weierstrass theorem: both can be viewed as assertions that tie the distinguishing power of a class of objects to their expressive power. We provide an elementary theorem connecting the relevant notions of "distinguishing power". We also discuss machine learning and data science contexts in which these theorems, and more generally the theme of links between distinguishing power and expressive power, appear. Finally, we discuss the same theme in the context of linguistics, where it appears as a foundational principle, and illustrate it with several examples.

artificial intelligence, invariant, machine learning, (15 more...)

arXiv.org Machine Learning

2510.09902

Country:

Asia > Indonesia > New Guinea > Western New Guinea > Papua (0.14)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
Oceania > Papua New Guinea (0.04)
(9 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

What Do Indonesians Really Need from Language Technology? A Nationwide Survey

Kautsar, Muhammad Dehan Al, Susanto, Lucky, Wijaya, Derry, Koto, Fajri

arXiv.org Artificial Intelligence, Sep-30-2025

There is an emerging effort to develop NLP for Indonesia's 700+ local languages, but progress remains costly due to the need for direct engagement with native speakers. However, it is unclear what these language communities truly need from language technology. To address this, we conduct a nationwide survey to assess the actual needs of native speakers in Indonesia. Our findings indicate that addressing language barriers, particularly through machine translation and information retrieval, is the most critical priority. Although there is strong enthusiasm for advancements in language technology, concerns around privacy, bias, and the use of public data for AI training highlight the need for greater transparency and clear communication to support broader AI adoption.

artificial intelligence, chatbot, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.07506

Country:

Asia > Indonesia > Sulawesi > South Sulawesi > Makassar (0.04)
North America > United States > California (0.04)
North America > Canada > Ontario > Toronto (0.04)
(34 more...)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)
Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)

Cross-Cultural Transfer of Commonsense Reasoning in LLMs: Evidence from the Arab World

Almheiri, Saeed, Hossam, Rania, Attia, Mena, Wang, Chenxi, Nakov, Preslav, Baldwin, Timothy, Koto, Fajri

arXiv.org Artificial Intelligence, Sep-24-2025

Large language models (LLMs) often reflect Western-centric biases, limiting their effectiveness in diverse cultural contexts. Although some work has explored cultural alignment, the potential for cross-cultural transfer, using alignment in one culture to improve performance in others, remains underexplored. This paper investigates cross-cultural transfer of commonsense reasoning in the Arab world, where linguistic and historical similarities coexist with local cultural differences. Using a culturally grounded commonsense reasoning dataset covering 13 Arab countries, we evaluate lightweight alignment methods such as in-context learning and demonstration-based reinforcement (DITTO), alongside baselines like supervised fine-tuning and direct preference optimization. Our results show that merely 12 culture-specific examples from one country can improve performance in others by 10% on average, within multilingual models. In addition, we demonstrate that out-of-culture demonstrations from Indonesia and US contexts can match or surpass in-culture alignment for MCQ reasoning, highlighting cultural commonsense transferability beyond the Arab world. These findings demonstrate that efficient cross-cultural alignment is possible and offer a promising approach to adapt LLMs to low-resource cultural settings.

demonstration, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.19265

Country:

Africa > Middle East > Egypt (0.15)
Europe > Austria > Vienna (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
(18 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.81)

Enhancing Poverty Targeting with Spatial Machine Learning: An application to Indonesia

Martinez, Rolando Gonzales, Cooray, Mariza

arXiv.org Machine Learning, Mar-6-2025

This study leverages spatial machine learning (SML) to enhance the accuracy of Proxy Means Testing (PMT) for poverty targeting in Indonesia. Conventional PMT methodologies are prone to exclusion and inclusion errors due to their inability to account for spatial dependencies and regional heterogeneity. By integrating spatial contiguity matrices, SML models mitigate these limitations, facilitating a more precise identification and comparison of geographical poverty clusters. Utilizing household survey data from the Social Welfare Integrated Data Survey (DTKS) for the periods 2016 to 2020 and 2016 to 2021, this study examines spatial patterns in income distribution and delineates poverty clusters at both provincial and district levels. Empirical findings indicate that the proposed SML approach reduces exclusion errors from 28% to 20% compared to standard machine learning models, underscoring the critical role of spatial analysis in refining machine learning-based poverty targeting. These results highlight the potential of SML to inform the design of more equitable and effective social protection policies, particularly in geographically diverse contexts. Future research can explore the applicability of spatiotemporal models and assess the generalizability of SML approaches across varying socio-economic settings.

exclusion error, inclusion error, spatial machine, (12 more...)

arXiv.org Machine Learning

2503.043

Country:

North America > United States (0.05)
Asia > Indonesia > Nusa Tenggara Islands (0.05)
Asia > Indonesia > Sumatra > Bengkulu > Bengkulu (0.04)
(17 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

CROPE: Evaluating In-Context Adaptation of Vision and Language Models to Culture-Specific Concepts

Nikandrou, Malvina, Pantazopoulos, Georgios, Vitsakis, Nikolas, Konstas, Ioannis, Suglia, Alessandro

arXiv.org Artificial Intelligence, Oct-20-2024

As Vision and Language models (VLMs) become accessible across the globe, it is important that they demonstrate cultural knowledge. In this paper, we introduce CROPE, a visual question answering benchmark designed to probe the knowledge of culture-specific concepts and evaluate the capacity for cultural adaptation through contextual information. This allows us to distinguish between parametric knowledge acquired during training and contextual knowledge provided during inference via visual and textual descriptions. Our evaluation of several state-of-the-art open VLMs shows large performance disparities between culture-specific and common concepts in the parametric setting. Moreover, experiments with contextual knowledge indicate that models struggle to effectively utilize multimodal information and bind culture-specific concepts to their depictions. Our findings reveal limitations in the cultural understanding and adaptability of current VLMs that need to be addressed toward more culturally inclusive models.

arxiv preprint arxiv, computational linguistic, proceedings, (13 more...)

arXiv.org Artificial Intelligence

2410.15453

Country:

North America > Dominican Republic (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Middle East > Jordan (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?

Aycock, Seth, Stap, David, Wu, Di, Monz, Christof, Sima'an, Khalil

arXiv.org Artificial Intelligence, Sep-27-2024

Extremely low-resource (XLR) languages lack substantial corpora for training NLP models, motivating the use of all available resources such as dictionaries and grammar books. Machine Translation from One Book (Tanzer et al., 2024) suggests prompting long-context LLMs with one grammar book enables English-Kalamang translation, an unseen XLR language - a noteworthy case of linguistic knowledge helping an NLP task. We investigate whether the book's grammatical explanations or its parallel examples are most effective for learning XLR translation, finding almost all improvement stems from the parallel examples. Further, we find similar results for Nepali, a seen low-resource language, and achieve performance comparable to an LLM with a grammar book by simply fine-tuning an encoder-decoder translation model. We then investigate where grammar books help by testing two linguistic tasks, grammaticality judgment and gloss prediction, and we explore what kind of grammatical knowledge helps by introducing a typological feature prompt that achieves leading results on these more relevant tasks. We thus emphasise the importance of task-appropriate data for XLR languages: parallel examples for translation, and grammatical data for linguistic tasks. As we find no evidence that long-context LLMs can make effective use of grammatical explanations for XLR translation, we suggest data collection for multilingual XLR tasks such as translation is best focused on parallel data over linguistic description.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2409.19151

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
(22 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

You are what you eat? Feeding foundation models a regionally diverse food dataset of World Wide Dishes

Magomere, Jabez, Ishida, Shu, Afonja, Tejumade, Salama, Aya, Kochin, Daniel, Yuehgoh, Foutse, Hamzaoui, Imane, Sefala, Raesetje, Alaagib, Aisha, Semenova, Elizaveta, Crais, Lauren, Hall, Siobhan Mackenzie

arXiv.org Artificial Intelligence, Jun-13-2024

Foundation models are increasingly ubiquitous in our daily lives, used in everyday tasks such as text-image searches, interactions with chatbots, and content generation. As use increases, so does concern over the disparities in performance and fairness of these models for different people in different parts of the world. To assess these growing regional disparities, we present World Wide Dishes, a mixed text and image dataset consisting of 765 dishes, with dish names collected in 131 local languages. World Wide Dishes has been collected purely through human contribution and decentralised means, by creating a website widely distributed through social networks. Using the dataset, we demonstrate a novel means of operationalising capability and representational biases in foundation models such as language models and text-to-image generative models. We enrich these studies with a pilot community review to understand, from a first-person perspective, how these models generate images for people in five African countries and the United States. We find that these models generally do not produce quality text and image outputs of dishes specific to different regions. This is true even for the US, which is typically considered to be more well-resourced in training data - though the generation of US dishes does outperform that of the investigated African countries. The models demonstrate a propensity to produce outputs that are inaccurate as well as culturally misrepresentative, flattening, and insensitive. These failures in capability and representational bias have the potential to further reinforce stereotypes and disproportionately contribute to erasure based on region. The dataset and code are available at https://github.com/oxai/world-wide-dishes/.

dall-e 2, dataset, information, (16 more...)

arXiv.org Artificial Intelligence

2406.09496

Country:

North America > United States (0.88)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Africa > Democratic Republic of the Congo (0.14)
(98 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.92)
Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.52)

Synergetic Event Understanding: A Collaborative Approach to Cross-Document Event Coreference Resolution with Large Language Models

Min, Qingkai, Guo, Qipeng, Hu, Xiangkun, Huang, Songfang, Zhang, Zheng, Zhang, Yue

arXiv.org Artificial Intelligence, Jun-4-2024

Cross-document event coreference resolution (CDECR) involves clustering event mentions across multiple documents that refer to the same real-world events. Existing approaches utilize fine-tuning of small language models (SLMs) like BERT to address the compatibility among the contexts of event mentions. However, due to the complexity and diversity of contexts, these models are prone to learning simple co-occurrences. Recently, large language models (LLMs) like ChatGPT have demonstrated impressive contextual understanding, yet they encounter challenges in adapting to specific information extraction (IE) tasks. In this paper, we propose a collaborative approach for CDECR, leveraging the capabilities of both a universally capable LLM and a task-specific SLM. The collaborative strategy begins with the LLM accurately and comprehensively summarizing events through prompting. Then, the SLM refines its learning of event representations based on these insights during fine-tuning. Experimental results demonstrate that our approach surpasses the performance of both the large and small language models individually, forming a complementary advantage. Across various datasets, our approach achieves state-of-the-art performance, underscoring its effectiveness in diverse scenarios.

computational linguistic, coreference resolution, event mention, (14 more...)

arXiv.org Artificial Intelligence

2406.02148

Country:

Asia > Singapore (0.05)
North America > Canada > Ontario > Toronto (0.04)
Asia > Indonesia > New Guinea > Western New Guinea > Papua (0.04)
(18 more...)

Genre: Research Report > New Finding (0.88)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

An Open-Source Reproducible Chess Robot for Human-Robot Interaction Research

Zhang, Renchi, de Winter, Joost, Dodou, Dimitra, Seyffert, Harleigh, Eisma, Yke Bauke

arXiv.org Artificial Intelligence, May-28-2024

Recent advancements in AI have sped up the evolution of versatile robot designs. Chess provides a standardized environment that allows for the evaluation of the influence of robot behaviors on human behavior. This article presents an open-source chess robot for human-robot interaction (HRI) research, specifically focusing on verbal and non-verbal interactions. OpenChessRobot recognizes chess pieces using computer vision, executes moves, and interacts with the human player using voice and robotic gestures. We detail the software design, provide quantitative evaluations of the robot's efficacy and offer a guide for its reproducibility.

Keywords: Artificial Intelligence, Chess, Human-robot Interaction, Open-source, Transfer Learning

1. Introduction

Robots are becoming increasingly common across a variety of traditionally human-controlled domains. Examples range from automated mowers that maintain community lawns to robots in assembly lines and agricultural settings. Recent scientific advancements in AI have enabled new opportunities for intelligent sensing, reasoning, and acting by robots. In particular, the rapid development of large language models, such as ChatGPT, and vision-language models has lowered the barrier to human-to-robot communication by making it possible to transform text and images into interpretable actions or vice versa. As technology advances, it is likely that robots will attain greater capabilities and will be able to tackle tasks previously within the exclusive realm of human expertise. This ongoing evolution may also lead to closer and more productive interactions between humans and robots. At the same time, integrating different AI-based robotic components remains a challenge, and the human-robot interaction (HRI) field lags in terms of endorsing reproducibility principles (Gunes et al., 2022). Encouraging transparent and reproducible research, therefore, remains an ongoing task.

Furthermore, chess has played an important role in advancing the field of AI, from Claude Shannon's chess-playing algorithm (Shannon, 1950) to the success of IBM's Deep Blue (Campbell et al., 2002) and DeepMind's self-play learning algorithm (Silver et al., 2018). In this paper, we incorporate modern AI algorithms into the design of a chess-playing robot to be used for studying HRI. HRI research may benefit from a chess-based setup because the game of chess provides a controlled rule-based environment in which the impact of robots on human players can be precisely measured.

dataset, interaction, robot, (16 more...)

arXiv.org Artificial Intelligence

2405.1817

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
(15 more...)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Chess (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.91)
Information Technology > Artificial Intelligence > Games > Chess (0.91)
